Text segmentation and label assignment with user interaction by means of topic-specific
language models and topic-specific label statistics



The present invention relates to the field of generating structured documents from
unstructured text by segmenting the unstructured text into text sections and assigning
a label to each section as a section heading. The text segmentation and the assignment
of labels to text sections, also denoted as labelling, are presented to a user who has
control of the segmentation and labelling procedure.

Text documents that are generated by a speech to text transcription process usually do
not provide any structure, since conventional speech to text transcription systems or
speech recognition systems only literally transcribe the recorded speech into
corresponding text. Explicitly dictated commands for text formatting, text highlighting,
punctuation or text headings have to be properly recognized and processed by the speech
recognition system or by a text formatting procedure that is subsequently applied to the
text generated by the speech recognition process.

Both automatic speech recognition systems and automatic text formatting systems, which
are typically based on training data and/or manually designed text formatting rules,
inevitably produce errors because they lack the human expertise that is needed to
properly identify complex formatting commands, section boundaries and distinct text
portions, e.g. those representing a section heading. The result of an ordinary speech to
text transcription process or text formatting process therefore has to be provided to a
human proof reader. The proof reader has to browse through the entire document, thereby
gathering information about its content, and has to decide whether the speech to text
transcription process produced reasonable results and whether the text formatting has
been performed correctly with respect to the content of the document.

The task of the proof reader becomes even harder when the structure of a document is not
explicitly dictated, i.e. when many headings and section boundaries are not explicitly
encoded in the spoken dictation. Furthermore, when even sentence structures, i.e.
punctuation symbols, are rarely dictated, these punctuation symbols have to be manually
inserted by the proof reader.

The partitioning of a text into sections is an especially demanding task for a proof
reader, because a change of section type cannot be detected before a longer part of the
new section has been read. The proof reader then has to jump back to some position in
the already examined text in order to insert a section boundary and an appropriate
heading. This constant jumping between different positions in the document is very time
consuming and exhausting for the human proof reader.

The present invention aims to provide a method, a computer program product, a text
segmentation system as well as a user interface for a text segmentation system in order
to perform a segmentation and labelling of an unstructured text in response to a user's
decision.

The present invention provides an efficient user interface for a text processing system
which employs a method of segmenting a text into text sections, of assigning a topic to
each section, and of assigning a label in the form of a section heading to each text
section. These tasks are performed using statistical models which are trained on the
basis of annotated training data. First, the method segments the text into text sections
by making use of the statistical models extracted from the training data. After the text
has been segmented, each text section is assigned a topic that is indicative of the
content of the text section. The assignment of the topic to a text section also makes
use of the statistical models extracted from the training data. After the text
segmentation and the topic assignment have been performed, a structured text is
generated by inserting a label as a section heading into the text. The label is inserted
at a position corresponding to a section border, in such a way that the label is
directly followed by the section it refers to. The inserted label is to be understood as
a heading which precedes the following text section.

When the structured text has been generated in the way described above, it is provided
to a user who has control of the segmentation, the topic assignment and the general
structuring of the text. The method finally performs modifications of the structured
text in response to the user's review.

According to a preferred embodiment of the invention, the insertion of labels as section
headings comprises a text formatting procedure incorporating formatting steps such as
punctuating, highlighting, indenting and modifying the typeface.

According to a further preferred embodiment of the invention, the topic assignment to a
text section also comprises the assignment of a set of labels to the text section. One
label of the set of assigned labels is finally inserted as a section heading into the
text. Here, a topic represents a rather abstract declaration of a distinct class or type
of section. Such a declaration is particularly applicable to so-called organized
documents that follow a typical or predefined structure. For example, a medical report
features a topic sequence such as demographic header, patient history, physical
examination and medication.

Each section of such a structured document can be identified by an abstract topic. In
contrast to the abstract topic, a label is indicative of a concrete heading of such a
section. For example, the section referring to an examination of the patient can be
labelled in a plurality of different ways, such as "physical examination",
"examination", "exam" or "surgical examination". No matter how a section of text is
labelled, the content of the section, in this case an examination, is identified by the
assigned topic.



The segmentation of the text into text sections can for example be performed by a method
disclosed in US Pat. No. 6,052,657, which makes use of language models and language
model scores in order to indicate a correlation between a block of text and a language
model. A more accurate and reliable procedure for text segmentation and topic assignment
is disclosed in the patent application "Text segmentation and topic annotation for
document structuring" filed concurrently herewith by the same applicant. That document
describes a statistical model for text segmentation and topic annotation which makes
explicit use of a topic sequence probability, a topic position probability, a section
length probability and a text emission probability. These probabilities are especially
helpful when the underlying annotated training data refer to organized documents.

WO 2005/050474 PCT/IB2004/052405

According to a further preferred embodiment of the invention, the assignment of one
label of the set of labels to a text section, and the insertion of this label as a
section heading into the text, take into account count statistics based on the training
data and/or explicit or partial verbalizations found at the beginning of a section. The
count statistics reflect the observed frequency with which a section assigned to some
topic is preceded by a specific label. In this way, the most frequently assigned label
per topic may be selected as a default heading if no other hints about the most suitable
label or heading are found in the text. In other words, a default label is assigned to a
text section by means of the count statistics.
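The default-label rule can be illustrated with a minimal sketch. It assumes that the
training data is available as (topic, label) pairs; the function names and the sample
data are hypothetical and not taken from the invention.

```python
from collections import Counter, defaultdict


def build_label_counts(training_sections):
    """Count how often each heading precedes a section of a given topic."""
    counts = defaultdict(Counter)
    for topic, label in training_sections:
        counts[topic][label] += 1
    return counts


def default_label(counts, topic):
    """Return the most frequently observed label for the topic, or None."""
    if topic not in counts or not counts[topic]:
        return None
    return counts[topic].most_common(1)[0][0]


# Hypothetical annotated training data: (topic, heading seen in training).
training = [
    ("examination", "physical examination"),
    ("examination", "physical examination"),
    ("examination", "exam"),
    ("medication", "medications"),
]
label_counts = build_label_counts(training)
print(default_label(label_counts, "examination"))
```

A topic that never occurred in the training data yields no default label, so a caller
would have to fall back to some other source for the heading.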

Alternatively, the label assignment based on the count statistics is overruled when an
explicit verbalization is found at the beginning of a section that exactly matches one
of the labels in the set assigned to the section. Furthermore, if no label exactly
matches an explicit verbalization found at the beginning of a section, a label that only
partially matches some verbalization found at the beginning of the section may be
inserted instead of the default label. The assignment of one label to a text section,
i.e. the selection of one label from the set of labels assigned to the text section, can
also be performed with respect to the count statistics based on the training data in
combination with explicit full or partial verbalizations found at the beginning of a
section.
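The precedence described above (exact verbalization, then partial verbalization, then
default label) might be sketched as follows. The text does not define what counts as a
partial match; this sketch assumes, purely for illustration, that a label matches
partially when the first word of the section is one of the words of the label. The
function name is hypothetical.

```python
def select_label(section_words, candidate_labels, default):
    """Pick a heading: exact match at section start, then partial match, then default.

    candidate_labels is assumed to be ordered by descending training frequency.
    """
    words = [w.lower() for w in section_words]
    # 1. Exact match: all words of the label open the section; prefer the longest label.
    exact = [
        label for label in candidate_labels
        if words[:len(label.split())] == label.lower().split()
    ]
    if exact:
        return max(exact, key=lambda label: len(label.split()))
    # 2. Partial match (assumed definition): first section word occurs in the label.
    for label in candidate_labels:
        if words and words[0] in label.lower().split():
            return label
    # 3. No hint in the text: fall back to the default label.
    return default


labels = ["physical examination", "examination", "exam"]
print(select_label("examination shows normal findings".split(), labels, labels[0]))
```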

According to a further preferred embodiment of the invention, if some full or partial
verbalization is found at the beginning of the section, this verbalization may be
removed from the section. This is useful if the verbalization represents an explicitly
dictated heading which is replaced by the inserted label. As an example, a section
starting with "medications the patient takes ..." can be assigned the label
"medications". Since this label serves as a heading for the subsequent section, the term
"medications" itself should be removed from the text of the section, leaving the proper
content of the section starting with "the patient takes ...". Modifications of this
strategy include the removal of some predefined filler words which may be part of the
dictated heading or initial phrase of some section, even if these filler words are not
part of the label. For example, a section starting with "medications are X, Y, Z, ..."
is converted into the heading "medications" followed by the list of medications
"X, Y, Z, ...", where the filler word "are" is skipped.
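A minimal sketch of this removal step is given below. It only handles an exact match
between the label and the section start, and the filler word list is a hypothetical
example rather than a list given by the invention.

```python
FILLER_WORDS = {"are", "is", "include"}  # hypothetical predefined filler words


def strip_heading(section_words, label, fillers=FILLER_WORDS):
    """Remove a dictated heading (and filler words right after it) from the section start.

    Returns (remaining_words, removed_words). The removed words are kept so that
    the original text can be restored later if needed.
    """
    label_words = label.lower().split()
    n = len(label_words)
    if [w.lower() for w in section_words[:n]] != label_words:
        return list(section_words), []  # section does not start with the label
    end = n
    while end < len(section_words) and section_words[end].lower() in fillers:
        end += 1
    return list(section_words[end:]), list(section_words[:end])


print(strip_heading("medications the patient takes aspirin".split(), "medications"))
print(strip_heading("medications are X Y Z".split(), "medications"))
```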

According to a further preferred embodiment of the invention, the insertion of a section
heading into the text, e.g. due to an exact match between an explicit verbalization and
a label, can be overruled by the user. In this case, the insertion is reversed by the
method and the original text portion is restored. More specifically, if some
section-initial words have been removed due to a match with the assigned label, these
words have to be re-inserted when the user decides on a different label which does not
match the removed words.
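The restore behaviour can be sketched as follows, assuming that each section remembers
which section-initial words were removed when its heading was inserted. The `Section`
class and the `relabel` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    heading: str
    words: list
    removed: list = field(default_factory=list)  # words stripped for the heading


def relabel(section, new_label):
    """Apply a user-chosen heading; restore stripped words if they no longer match."""
    removed_text = [w.lower() for w in section.removed]
    new_words = new_label.lower().split()
    if section.removed and removed_text[:len(new_words)] != new_words:
        # The new heading does not cover the removed words: put them back.
        section.words = section.removed + section.words
        section.removed = []
    section.heading = new_label
    return section


s = Section("medications", ["the", "patient", "takes"], ["medications"])
relabel(s, "patient history")
print(s.heading, s.words)
```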

According to a further preferred embodiment of the invention, providing the structured
text to a user further comprises providing the complete set of labels assigned to each
text section. Since each label of the set of labels represents an alternative for the
section heading, the user can easily compare the automatically inserted section heading
with alternative headings.

According to a further preferred embodiment of the invention, providing the structured
text to a user further comprises providing indications of alternative section borders.
In this way not only the section borders automatically inserted into the text by the
present method are visible to the user, but alternative section borders are also
provided to facilitate proof reading. The proof reader's task of finding the correct
section borders of the document is thereby reduced to inspecting the automatically
inserted section borders and the alternative section borders.

According to a further preferred embodiment of the invention, modifications of the
structured text in response to the user's review refer to the modification of the
segmentation of the text into text sections and/or modifications of the assignment
between labels and text sections. Furthermore, modifications of performed formatting
such as punctuation, highlighting and the like are also conceivable.

According to a further preferred embodiment of the invention, modifications of the text
segmentation and modifications of the assignment of labels to text sections performed in
response to the user's review are initiated by the user selecting one of the provided
labels or one of the alternative section borders. The modification selected by the user
is then performed by the present method, replacing a section heading by a selected
section heading, or shifting a section border.

Accomplishing a first text modification may imply that a second text modification has to
be performed. For example, when the section headings are enumerated, the removal of a
text section requires a re-enumeration of the subsequent text sections or section
labels. Therefore, the present invention is further adapted to dynamically perform
modifications that become necessary due to a prior modification performed in response to
the user's review.
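The re-enumeration example can be written down in a few lines. The sketch assumes that
enumerated headings are stored as (number, title) pairs; both function names are
hypothetical.

```python
def renumber(headings):
    """Return the headings with consecutive numbers, given (number, title) pairs."""
    return [(i, title) for i, (_, title) in enumerate(headings, start=1)]


def remove_section(headings, index):
    """Remove one heading (first modification) and renumber the rest (second one)."""
    remaining = headings[:index] + headings[index + 1:]
    return renumber(remaining)


headings = [(1, "history"), (2, "examination"), (3, "medication")]
print(remove_section(headings, 1))
```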


According to a further preferred embodiment of the invention, a modification of the
assignment of a label to a text section as a section heading is performed in response to
the user either selecting one label of the provided set of labels assigned to the text
section, or entering a user defined label and assigning this user defined label as a
section heading to the text section. In this way the user can quickly and effectively
identify one label of the provided set of labels as the correct section heading or,
alternatively, define a previously unknown heading for the relevant text section.

The selection of one label from a set of labels, as well as the entering of a label, is
not restricted to positions in the text that were identified as section borders.
Instead, an appropriate set of labels can be provided at any position in the text upon
user request. In this way the user still has complete control of structuring and
labelling the document.

According to a further preferred embodiment of the invention, the processing of
modifications in response to the user's review subsequently triggers a re-segmentation
of the text into text sections and a regeneration of the structured text by inserting
labels as section headings referring to text sections. Both the re-segmentation and the
regeneration of the structured text make use of the statistical models extracted from
the training data and take into account the modifications that have already been
processed in response to the user's review. When, for example, a user has introduced a
modification in the text, either in the form of a redefinition of a section border or in
the form of re-labelling a section heading, the method of the present invention performs
a subsequent re-segmentation and regeneration of the structured text while leaving the
modifications made by the user unaltered. In this way modifications introduced by the
user are never overruled or re-modified by the inventive method.

According to a further preferred embodiment of the invention, the re-segmentation of the
text into sections as well as the regeneration of the structured text by inserting
labels as section headings is performed dynamically during a review process performed by
the proof reader or user. The re-segmentation of the text and the regeneration of the
structured text can be applied to all text sections, to the current and all following
sections, or to a single section if specified by the user. For example, when a new
section boundary is introduced or when a heading is removed by the user, it is
reasonable to restrict a further restructuring or heading update to the current section
only. In this way the method can respond faster to small, local changes that have to be
introduced into the text.

According to a further preferred embodiment of the invention, the granularity of the
text segmentation can be controlled by the user by customizing a so-called granularity
parameter. In this way the user can determine whether the text is structured in a finer
or coarser way. A change of the customizable granularity parameter results in the
removal or insertion of text sections.
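The text does not specify how the granularity parameter acts on the segmentation. One
conceivable realization, shown here only as an assumption, is to attach a score to every
candidate section boundary and to keep those boundaries whose score reaches a threshold
derived from the parameter. The function name, the score range and the threshold rule
are all hypothetical.

```python
def boundaries_for(candidates, granularity):
    """Keep candidate boundaries whose score reaches 1 - granularity.

    candidates:  dict mapping word position -> boundary score in [0, 1] (assumed)
    granularity: 0.0 (coarsest) .. 1.0 (finest)
    """
    threshold = 1.0 - granularity
    return sorted(pos for pos, score in candidates.items() if score >= threshold)


candidates = {12: 0.9, 40: 0.5, 77: 0.2}
print(boundaries_for(candidates, 0.2))  # coarse: few sections
print(boundaries_for(candidates, 0.6))  # finer: more sections
```

Raising the parameter in this sketch inserts section boundaries and lowering it removes
them, which mirrors the insertion and removal of text sections described above.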

According to a further preferred embodiment of the invention, modifications that are
performed in response to the user's review are logged and analyzed by the present method
in order to further train the statistical models. In this way the entire method can
effectively be adapted to the user's preferences. When, for example, a distinct label
has been repeatedly removed from the text by the user, the method of text segmentation
refrains from inserting this section heading in future applications. The impact of the
user's modifications on the adaptation of the method, and hence the sensitivity of the
adaptation, may also be controlled by the user. This means, for example, that an
insertion or a removal of a label has to occur a predefined number of times before the
method adapts to this particular user introduced modification. The number of times a
change has to be made manually before the method adapts to it may be given by the user.
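The user-controlled sensitivity can be sketched as a simple counter with a threshold.
The class below is a hypothetical illustration and covers label removals only; it is
not the training procedure of the statistical models themselves.

```python
from collections import Counter


class LabelAdapter:
    """Suppress a heading after the user has removed it `sensitivity` times."""

    def __init__(self, sensitivity=3):
        self.sensitivity = sensitivity  # number of removals required before adapting
        self.removals = Counter()

    def log_removal(self, topic, label):
        """Record that the user removed `label` from a section of `topic`."""
        self.removals[(topic, label)] += 1

    def is_suppressed(self, topic, label):
        return self.removals[(topic, label)] >= self.sensitivity

    def filter_labels(self, topic, labels):
        """Drop labels the user has removed often enough."""
        return [label for label in labels if not self.is_suppressed(topic, label)]


adapter = LabelAdapter(sensitivity=2)
adapter.log_removal("examination", "exam")
adapter.log_removal("examination", "exam")
print(adapter.filter_labels("examination", ["exam", "examination"]))
```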

Furthermore, the adaptation of the method to user introduced modifications can already
take effect for subsequent sections of the present document. The method adapts to
modifications introduced by the user in the beginning part of a document and
automatically performs corresponding modifications within subsequent text sections. The
adaptation therefore applies to the present document as well as to future documents to
which the inventive method is applied.

In the following, preferred embodiments of the invention will be described in greater
detail by making reference to the drawings in which:

Fig. 1 illustrates a flow chart of the segmentation method of the present invention,
Fig. 2 illustrates a flow chart for text segmentation incorporating analysis of user
introduced modifications,
Fig. 3 illustrates a flow chart of an implementation of the present invention into a
speech recognition process,
Fig. 4 shows a block diagram of the user interface of the present invention,
Fig. 5 shows a block diagram of the segmentation system.

Figure 1 illustrates a flow chart of the text segmentation and topic assignment method.
In the first step 100 an unstructured text, generated e.g. by a speech to text
transcription system, is inputted. Based on the inputted text, in step 102 the method
performs a structuring and topic assignment of the text by segmenting the text into text
sections and assigning a topic to each text section. In order to perform the text
segmentation and topic assignment in step 102, language models or statistical models
extracted from training data are provided to step 102 by step 104. Step 105 provides
label count statistics indicating the probability that a label is assigned to a topic.
Based on the training data, the label count statistics reflect how often a label is
assigned to a topic.
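To make step 102 more concrete, the following sketch assigns one topic to each sentence
with Viterbi decoding and treats runs of equal topics as sections. It is a deliberately
simplified stand-in, not the model of the invention: it assumes per-topic unigram word
probabilities, a table of topic transition probabilities, a uniform prior over the first
topic and a fixed floor probability for unseen events. All names and numbers are
hypothetical.

```python
import math


def segment(sentences, unigrams, transitions, floor=1e-4):
    """Assign one topic per sentence; runs of equal topics form the text sections.

    unigrams[topic][word] -> word probability under the topic (unseen words: floor)
    transitions[(a, b)]   -> probability of moving from topic a to topic b
    """
    topics = list(unigrams)

    def emit(topic, sentence):
        return sum(math.log(unigrams[topic].get(w, floor)) for w in sentence)

    def trans(a, b):
        return math.log(transitions.get((a, b), floor))

    # best[t] = (log score, topic path) of the best path ending in topic t
    best = {t: (emit(t, sentences[0]), [t]) for t in topics}
    for sentence in sentences[1:]:
        new_best = {}
        for t in topics:
            score, path = max(
                ((best[p][0] + trans(p, t), best[p][1]) for p in topics),
                key=lambda item: item[0],
            )
            new_best[t] = (score + emit(t, sentence), path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


unigrams = {
    "history": {"patient": 0.3, "reports": 0.3, "pain": 0.2},
    "medication": {"aspirin": 0.4, "daily": 0.3, "patient": 0.1},
}
transitions = {
    ("history", "history"): 0.7, ("history", "medication"): 0.3,
    ("medication", "medication"): 0.9, ("medication", "history"): 0.1,
}
sentences = [["patient", "reports", "pain"], ["patient", "pain"], ["aspirin", "daily"]]
print(segment(sentences, unigrams, transitions))
```

A section border lies wherever the topic changes between two consecutive sentences.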

In step 106 a label is assigned to each text section as a section heading and inserted
at the appropriate position into the text by making reference to the count statistics
provided by step 105 and the segmented text provided by step 102. After the label
assignment has been performed by step 106, the segmented text and the inserted labels as
well as alternative labels are provided to a user in step 108. Furthermore, alternative
section boundaries are provided to the user in step 108. In the subsequent step 110 the
user decides whether the segmentation and label assignment provided in step 108 are
acceptable. Alternatively, the user can select alternative headings provided by step 108
or alternative segmentations given by the alternative section boundaries.

If none of the provided alternatives satisfies the user's preferences, the user can also
enter a section boundary as well as a section heading. The user's decision of step 110
is processed by the method in step 112. Processing of the user's decision comprises
replacing inserted section headings, re-labelling subsequent section headings,
restructuring subsequent parts of the document, or restructuring and re-labelling the
entire document. Furthermore, a dynamic processing of user introduced modifications is
also conceivable. Dynamic processing means that a user introduced modification
automatically triggers further modifications that relate to subsequent text sections, or
modifications to be performed during a subsequent application of the structuring method.

After the user decision has been processed in step 112, the resulting modifications are
performed in the following step 114.

Figure 2 illustrates a flow chart of the text segmentation and topic assignment method
incorporating analysis of user introduced modifications. In a first step 200 an
unstructured text, resulting e.g. from a speech to text transcription process, is
provided to step 202. In step 202 a segmentation of the text into text sections is
performed by making use of language models or statistical models provided by step 204.
Furthermore, in step 202 a topic is assigned to each text section by making use of the
statistical information stored in the language model provided by step 204.

After the text has been segmented into text sections and each text section has been
assigned a topic in step 202, in the following step 206 a label is assigned to each text
section as a section heading and inserted at the appropriate position in the text. The
assignment of a label performed in step 206 makes explicit use of the label count
statistics provided to step 206 by step 205. Based on the training data, the label count
statistics reflect how often a label is assigned to a topic.

After the text has been structured by segmenting it into text sections, assigning a
topic to each text section and further assigning a label to each text section, the
segmented text, the assigned headings as well as alternatives are provided to a user in
step 208. The alternatives provided to the user refer to alternative text segmentations
as well as alternative section labels. In the following step 210 the user decides
whether to accept the performed segmentation of the text and the performed assignment of
section labels, or to select one of the provided alternatives. Furthermore, the user can
also enter an arbitrary segmentation as well as an arbitrary section heading according
to his or her preference. After the user decision of step 210, in the following step 214
the method checks whether any modifications have been introduced by the user. When no
user introduced modification has been detected in step 214, the method ends in step 218,
resulting in a structured and labelled text as generated in step 206. In contrast, when
a user introduced modification has been detected in step 214, the method proceeds with
step 212 in which the user introduced modifications are processed and performed. The
processing and performing of the user's decisions incorporates a multiplicity of
different text segmentation, text labelling and text formatting procedures.

After the user decision has been processed and performed in step 212, the method
proceeds with step 216. In step 216 the user introduced modifications are stored as
external conditions for the next application of the structuring and assigning procedure.
After step 216, depending on whether the user modification refers to the text
structuring or to the label assignment of text sections, the method returns either to
step 202, in which a new structuring is performed, or to step 206, in which a new label
assignment is performed.

In a similar way, a new restructuring and reassignment of the text performed by steps
202 and 206 explicitly accounts for the already performed modifications provided by step
216. In this way it can be guaranteed that user performed modifications are never
overruled by the text structuring step 202 and the label assignment step 206.
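The guarantee that user decisions survive a new label assignment can be sketched as
follows. The sketch covers headings only and assumes that the stored modifications of
step 216 are available as a mapping from section index to the heading fixed by the user.
The function and parameter names are hypothetical, and `propose_label` merely stands in
for the automatic assignment of step 206.

```python
def relabel_all(sections, propose_label, user_labels):
    """Recompute the headings of all sections but keep every user-chosen heading.

    sections:      list of section texts
    propose_label: callable(index, text) -> automatic heading (stand-in for step 206)
    user_labels:   dict index -> heading fixed by the user (stored in step 216)
    """
    result = []
    for i, text in enumerate(sections):
        if i in user_labels:
            result.append(user_labels[i])  # a user decision is never overruled
        else:
            result.append(propose_label(i, text))
    return result


sections = ["the patient reports ...", "aspirin daily ...", "follow up in ..."]
print(relabel_all(sections, lambda i, text: "section %d" % (i + 1), {1: "medications"}))
```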

Figure 3 illustrates an implementation of the text segmentation and topic assignment
method into a speech recognition system. In step 300 speech is inputted into the system.
In the following step 302 a first portion of the speech, p=1, is selected. The first
portion of speech selected by step 302 is provided to step 304, which performs a speech
to text transcription by making use of a language model m. The language model m is
provided by step 306 to step 304. After the speech portion p has been transcribed into a
text portion t by step 304, the resulting text portion t corresponding to the speech
portion p is stored in step 308. In the following step 310 the speech portion index p is
compared to pmax, which indicates the last portion of the speech. If p is less than
pmax, p is incremented by 1 and the method returns to step 304. The steps 304, 308 and
310 are repeatedly applied until the speech portion index p equals the last speech
portion pmax. In this case the entire speech signal has been transcribed into text. The
resulting text then comprises a plurality of text portions t corresponding to the
portions p of the speech.
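The loop over steps 304, 308 and 310 can be sketched as below. The `transcribe`
callable is a hypothetical stand-in for the speech recognizer; the sketch stores each
text portion t under its portion index p, so that text sections can later be traced back
to the speech portions they came from.

```python
def transcribe_all(speech_portions, transcribe, language_model):
    """Transcribe the portions p = 1 .. pmax in order and store each text portion t."""
    texts = {}
    p, p_max = 1, len(speech_portions)
    while p <= p_max:
        t = transcribe(speech_portions[p - 1], language_model)  # step 304
        texts[p] = t                                            # step 308
        p += 1                                                  # step 310
    return texts


# Toy stand-in for a recognizer: upper-cases a placeholder "audio" string.
print(transcribe_all(["a1", "a2"], lambda audio, model: audio.upper(), None))
```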

Based on the transcribed text, in step 312 a segmentation of the text into text sections
is performed and each of the text sections is assigned a topic that is specific to the
content of the section. This segmentation procedure of step 312 makes use of statistical
models designed for text segmentation that are provided to step 312 by step 314. When
the text has been segmented and assigned to topics in step 312, in the following step
316 the topic assigned to each text section as well as the corresponding speech portions
p' of the text sections are determined. Based on this determination, a second speech
recognition of the speech portions p' referring to a distinct section can be performed
in the following step 318. Depending on the topic assigned to a text section, a topic
specific language model for the second speech recognition is provided by step 306. Since
the speech has been transcribed stepwise in the procedure described by the steps 300
through 310, a repeated speech recognition can selectively be performed for distinct
sections of text that correspond to speech portions p'.

When the repeated speech recognition step has been performed for each section of the
text, a user can introduce further modifications referring to the segmentation of the
text in step 320. According to the user introduced modifications of step 320, the method
returns to the text segmenting step 312. Here, depending on the user's feedback, a new
segmentation may take place and/or sections may be re-assigned to topics and labels.

When the performed text segmentation of step 312 and the repeated speech recognition of
step 318 are both accepted by the user, the method ends with step 322.

The assignment between a topic and a section performed in step 316, as well as the
speech transcription performed by step 304, can also make explicit use of a method of
text segmentation and topic annotation as described in the patent application "Text
segmentation and topic annotation for document structuring" and in the patent
application "Topic specific models for text formatting and speech recognition", both
filed concurrently herewith by the same applicant.

In this way the expertise of a human proof reader can be universally and effectively
coupled into a text segmentation and text labelling procedure as well as into a
corresponding speech recognition procedure.

Figure 4 shows a block diagram of a user interface of the present invention. The user
interface 400 is preferably adapted as a graphical user interface. The user interface
400 comprises a text window 402 and a suggestion window 404. The text that has been
subject to text segmentation and label assignment is provided within the text window
402. A label 406 that has been inserted as a section heading into the text is
highlighted so that it can be found more easily within the text provided in the text
window 402. When, for example, the user makes use of a pointer 408, the user can select
the label 406, and in response to the selection of the label 406 a label list 410 is
provided within the user interface. The label list 410 provides a whole set of labels
412, 414, 416 that serve as alternative labels that can be inserted into the text
instead of label 406.

Additionally or alternatively, the label list 410 can also be provided within the
suggestion window 404. By means of the pointer 408 the user can select one of the labels
412, 414, 416 provided by the label list 410 to replace the label 406 in the given text.
When none of the labels 406, 412, 414, 416 matches the user's preferences, the user can
enter a label by making use of the user input field 418. Once an alternative label has
been selected or entered by the user, the label 406 is replaced by the alternative
label. In a similar way, alternative text segmentations in the form of alternative
section boundaries are provided to the user together with the segmentation of the text,
and can be applied upon the user's selection.

Figure 5 shows a block diagram of a segmentation system of the present invention. The
segmentation system 500 comprises a graphical user interface 520, a structured text
module 518 for storing structured text, a processing unit 516, a statistical model
module 514 storing statistical models, an unstructured text module 512 storing
unstructured text and a speech recognition module 510 performing speech to text
transcription. The segmentation system 500 is connected to an external storage device
508 and to an input device 504. A user 506 can interact with the segmentation system via
the input device 504 and the graphical user interface 520 of the segmentation system
500.

Speech 502 that is inputted into the segmentation system is processed by the speech
recognition module 510. The speech recognition module 510 is connected to the
unstructured text module 512, where the unstructured text resulting from the speech to
text transcription process is stored. The unstructured text module 512 is connected to
the processing unit 516 in order to provide the unstructured text to the processing unit
516. The processing unit 516 is bidirectionally connected to the statistical model
module 514. By making use of the statistical information provided by the statistical
models stored in the statistical model module 514, the processing unit 516 performs a
text segmentation and label assignment to sections of the text on the basis of the
unstructured text provided by the unstructured text module 512. The speech recognition
module 510 also makes use of the language models stored in and provided by the
statistical model module 514. In this way the statistical model module provides language
models for the text segmentation as well as language models for the speech recognition.
The latter are typically of a different type than the language models for text
segmentation, because speech recognition usually makes use of trigrams whereas text
segmentation usually employs unigrams.
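The difference between the two model types lies in the amount of word context that is
counted. The following generic sketch, which is not specific to the invention, counts
unigrams (single words, n = 1) and trigrams (sequences of three words, n = 3) over the
same word sequence.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count the n-grams (as tuples) in a word sequence."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


words = "the patient takes the patient".split()
print(ngram_counts(words, 1))  # unigram counts ignore word order
print(ngram_counts(words, 3))  # trigram counts capture local word order
```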

When the processing unit 516 has performed a text segmentation and an assignment of
labels to text sections as section headings, the structured text generated in this way
is stored in the structured text module 518. The structured text module 518 is connected
to the graphical user interface 520 in order to provide the structured text to the user
506 by means of the graphical user interface 520. The user 506 can interact with the
segmentation system via the input device 504. For this purpose the input device 504 is
connected to the graphical user interface 520 and to the processing unit 516. When the
user 506 introduces modifications of either the text structuring or the label
assignment, the processing unit 516 performs a restructuring and a reassignment of the
structured text stored in the structured text module 518. The restructured and
reassigned structured text is repeatedly provided to the user until the performed
modifications match the user's preferences. When no further changes are introduced by
the user, the structured text stored in the structured text module 518 is transmitted to
the external storage device 508.

Furthermore, the structured text stored in the structured text module 518 can also be
exploited for improved speech recognition performed by means of the speech recognition
module 510. For this purpose the structured text module 518 is directly connected to the
speech recognition module 510. Making use of this context specific feedback allows a
more precise and specific speech recognition procedure to be performed by the speech
recognition module 510.

The invention therefore provides a method of document structuring and of assigning
labels to text sections that serve as section headings. Especially in the field of
automatic speech recognition and automatic speech transcription, the proof reading task
to be performed by a human proof reader is greatly facilitated. With the proposed
segmentation of the text, it is much easier for the proof reader to check whether the
text following some heading really represents a section of the corresponding type than
with conventional proof reading procedures, where a portion of text has to be examined,
a section has to be determined and a heading has to be inserted into the text by jumping
back to the beginning of a section.

Furthermore, the method supplies alternative section boundaries as well as alternative
section labels that can easily be selected by the proof reader. Moreover, during a proof
reading process the system learns the most frequent corrections introduced by the proof
reader and makes use of this information for future applications.